The quantum many-body problem of predicting properties of systems containing electrons or other fermionic entities has challenged physicists and chemists for decades. This is so because the minus sign associated with fermionic exchange creates a host of difficulties, including long-ranged entanglement and a Monte Carlo sign problem. The net effect is to place the generic fermion many-body problem in the class of problems whose full solution is exponentially hard. Although new developments such as matrix product and tensor network methods may provide solutions to ground-state properties with only power-law cost, the search for efficient approximate methods to handle a wide range of phenomena at a wide range of temperatures remains a key goal of condensed matter physics and quantum chemistry.

In this work, we investigate the use of Machine Learning (ML) [1] to leverage existing results and provide an efficient approximate solution to a generic class of problems in quantum many-body physics. ML is in essence a way to use a database of known solutions to infer information about a new problem. In the condensed matter physics context it has been used as an intermediate step in molecular dynamics calculations [2-7], to predict density functionals (so far only in the 1D context) [8], to obtain transmission coefficients for electron transport [9] and, very recently, to predict the Fermi-level density of states of weakly correlated solids [10] and find formation energies of materials [11]. These applications relate to classical physics and to single-particle quantum mechanics. In the quantum chemistry context, ML has been successfully applied to predict energies and other scalar properties of molecules [12-20]. In addition, non-ML ideas from data science have recently been proposed as ways to help solve the non-equilibrium many-body problem [21].

We propose to use machine learning methods to solve true quantum many-body problems. A technical issue arises: in many applications of machine learning, including most of the ones referred to above, the goal is to infer a scalar property (e.g. an energy) of a model specified by a modest number of scalar parameters. However, the generic quantum many-body problem is the solution of a functional equation relating an input function (for example a bare electron Green's function) to an output function (e.g. an electron self-energy). Here we build on previous work [22], which involved learning a function specified by a modest number of input parameters, to develop a formalism capable of solving the more general problem of mapping a function to a function. In independent contemporaneous work, questions related to learning functions have been studied in the context of density functional theory, where one seeks to learn the relation between a position-dependent charge density and an exchange-correlation potential [23, 24].

The context for our work is Dynamical Mean-Field Theory (DMFT) [25], a widely used approximate method for determining the properties of materials with strong electronic correlations. DMFT approximates the solution to an interacting fermion system in terms of the solution of an auxiliary quantum impurity problem. The impurity model, although simpler than the full problem, is still a quantum many-body problem. The impurity model is specified by a hybridization function $\Delta(\omega)$; the many-body physics by a local Green function $G(\omega)$ or self-energy $\Sigma(\omega)$. The hybridization function itself is obtained from a self-consistency condition which involves $\Sigma(\omega)$ and an initial band structure which encodes the chemistry and crystal structure of the material in question and may be parametrized as a bare or initial hybridization function $\Delta^0(\omega)$. In standard applications, the self-consistency condition is solved by iteration. One may imagine an ML process to solve the impurity model (relating $\Delta$ and $G/\Sigma$) or an ML process to solve the entire DMFT self-consistency loop (relating $\Delta^0$ and $\Delta^f, G^f/\Sigma^f$). In this paper, we only present results for the full solution of DMFT. Use of ML as an impurity solver may also be valuable as an intermediate step, enabling the rapid construction of a database of solved problems in the real-materials context. Our formalism is general enough to apply to this possibility.

We test our methods using the Hubbard model defined on a three-dimensional cubic lattice with first- and second-neighbor hoppings, with Hamiltonian

$$H = \sum_{k\sigma} (\varepsilon_k - \mu)\, c^\dagger_{k\sigma} c_{k\sigma} + U \sum_i n_{i\uparrow} n_{i\downarrow}.$$

Here $\mu$ is the chemical potential and $\varepsilon_k = -2t \sum_{a=1}^{3} \cos(k_a) - 4t' \left[\cos(k_1)\cos(k_2) + \cos(k_1)\cos(k_3) + \cos(k_2)\cos(k_3)\right]$. The bare hybridization function is

$$\Delta^0(\omega) = \omega + \mu - \left( \int \frac{d^3k}{(2\pi)^3} \frac{1}{\omega + \mu - \varepsilon_k} \right)^{-1}.$$

We define energy units such that the full non-interacting bandwidth $W = 2$, where $W = 12t$ if $|t'| < t/4$ and $W = 8t + 16|t'|$ if $|t'| > t/4$. Varying the ratio $t'/t$ changes the structure in the density of states, in particular shifting the location of the density of states peaks relative to the band center (see Section I of the supplemental material for examples).
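As an illustration of the unit convention, the following minimal sketch evaluates the dispersion, checks the closed-form bandwidth against a direct scan of the zone corners, and returns the hopping $t$ that gives $W = 2$ for each ratio $t'/t$ used in the database. The function names are ours and purely illustrative.

```python
import itertools
import math


def dispersion(k, t, tp):
    """Tight-binding energy eps_k for the cubic lattice with hoppings t and t'."""
    c = [math.cos(ka) for ka in k]
    nearest = -2.0 * t * sum(c)
    next_nearest = -4.0 * tp * (c[0] * c[1] + c[0] * c[2] + c[1] * c[2])
    return nearest + next_nearest


def bandwidth_formula(t, tp):
    """Closed-form bandwidth quoted in the text."""
    return 12.0 * t if abs(tp) < t / 4.0 else 8.0 * t + 16.0 * abs(tp)


def bandwidth_numeric(t, tp):
    """Max minus min of eps_k over the zone corners k_a in {0, pi}.

    eps_k is linear in each cos(k_a) separately, so its extrema lie at
    cos(k_a) = +1 or -1 and scanning the eight corners is sufficient.
    """
    corners = itertools.product((0.0, math.pi), repeat=3)
    energies = [dispersion(k, t, tp) for k in corners]
    return max(energies) - min(energies)


def hoppings_for_bandwidth(ratio, target_w=2.0):
    """Return (t, t') such that W = target_w for a given ratio t'/t."""
    t = target_w / bandwidth_formula(1.0, ratio)
    return t, ratio * t


for ratio in (0.0, -0.1, -0.2, -0.3):
    t, tp = hoppings_for_bandwidth(ratio)
    print(f"t'/t = {ratio:+.1f}: t = {t:.5f}, W = {bandwidth_numeric(t, tp):.5f}")
```

For $|t'| < t/4$ this gives $t = 1/6$, while for $t' = -0.3t$ the bandwidth is $12.8t$ and $t$ is correspondingly smaller.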

We seek a machine that enables us to map a $\Delta^0$ to an output local Green function or self-energy. DMFT admits two classes of solutions: metallic ones with a non-vanishing density of states at the Fermi level and a smooth self-energy, and Mott insulating solutions with a gap at the Fermi level due to Coulomb repulsion and (in many cases) a self-energy with a pole near the chemical potential. We have found it advantageous to introduce a binary classification step that identifies a given solution as metallic or insulating and to use two different machines to determine the properties of the two kinds of solutions. For classification, we use the entire database minus one entry as the training set and the remaining entry as the test problem. We then repeat for all members of the database. We tested three different ML methods for classification: a simple support vector machine (SVM) [28] with $\sim 96\%$ accuracy, neural networks [29] with $\sim 97\%$ accuracy and decision forests [30] with $\sim 99.6\%$ accuracy. The only misclassified problems are critical metals extremely close to the transition. We kept only the decision forest, as it outperformed the other two. Once the state of a new problem has been decided, the Kernel Ridge Regression (KRR) method [1, 22] (more details follow) is employed to determine the solution using the sub-databases containing only metallic or only Mott insulating solutions. The full ML process for DMFT is shown in Fig. 1, while some details about the parameters are explained later in the text.
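The leave-one-out protocol used for the classification test can be sketched as follows. This is only an illustration: the classifier is a one-nearest-neighbour stand-in chosen to keep the example short (it is not the SVM, neural network or decision forest used in this work), and the database is synthetic, with a made-up critical interaction.

```python
import numpy as np


def nearest_neighbour_label(train_x, train_y, x):
    """Stand-in classifier: label of the closest training descriptor (Manhattan distance)."""
    distances = np.abs(train_x - x).sum(axis=1)
    return train_y[np.argmin(distances)]


def leave_one_out_accuracy(descriptors, labels, classify=nearest_neighbour_label):
    """Hold out each entry once, train on all others, return the fraction classified correctly."""
    n = len(labels)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        predicted = classify(descriptors[keep], labels[keep], descriptors[i])
        correct += int(predicted == labels[i])
    return correct / n


# Synthetic toy database (NOT DMFT data): label 0 ("metal") below an
# arbitrary critical U, label 1 ("insulator") above it.
rng = np.random.default_rng(0)
u_values = rng.uniform(0.16, 4.0, size=200)
fillings = rng.uniform(0.6, 1.05, size=200)
toy_descriptors = np.column_stack([u_values, fillings])
toy_labels = (u_values > 2.5).astype(int)

accuracy = leave_one_out_accuracy(toy_descriptors, toy_labels)
print(f"leave-one-out accuracy on the toy database: {accuracy:.3f}")
```

Any classifier with the same `classify(train_x, train_y, x)` interface can be substituted; as in the text, the errors that remain in the toy example come from entries close to the boundary between the two classes.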

The first step in implementing machine learning is to generate a database of initial conditions, in other words a set of bare hybridization functions that span a range of physically reasonable possibilities. We consider the set of hybridization functions defined by $t' = [0, -0.1t, -0.2t, -0.3t]$ (the case of positive $t'/t$ could be accounted for by considering electron doping) and $\mu = 0$. Sections I and II.A of the supplemental material give more details. We then obtain the database of solved problems by using the exact diagonalization (ED) method [25-27] to solve the single-site dynamical mean-field approximation for interaction strengths in the range $0.16 < U < 4$ and densities in the range $0.6 < n_d < 1.05$. Particularities of the ED database are discussed in Section II.B of the supplemental material.



Figure 1: (Color online) Schematic view of DMFT from a machine learning perspective. From an input description of a problem for which we seek a solution, the ML first decides whether the solution is metallic or insulating. The ML then predicts the solution for the correlation function of choice by predicting the coefficients of the Legendre polynomial expansion of either the Green's function or the self-energy. In the case of the self-energy for the metal, the ML-predicted quasi-particle weight $Z$ can also be extracted.

The second step in implementing machine learning is the construction of a representation of the information to be learned and of the descriptor $D$, a unique identifier of a problem. Our input and output data are functions. Functions may be specified as a vector of coefficients in a space of basis functions $\phi_m$ (e.g. $\Sigma(z) = \sum_m f_m \phi_m(z)$). Our previous work [22], following work by Boehnke et al. [31], found that Legendre polynomials were a very efficient choice of basis, so we adopt this representation here. The Legendre representation is most naturally formulated in imaginary time $0 < \tau < \beta$, with $\beta$ the inverse temperature, and hence a correlation function is $f(\tau) = \sum_{l=0}^{\infty} \frac{\sqrt{2l+1}}{\beta} P_l(x(\tau)) f_l$, where $x(\tau) = 2\tau/\beta - 1$ and $P_l$ are the Legendre polynomials. The Fourier transform to $f(i\omega_n)$ can be done analytically [31]. The representation is general: we fit either the local Green's function or the self-energy, as shown in Fig. 1 (or even the hybridization function). See Section IV of the supplemental material for details.

The descriptor consists of the input function (hybridization function) plus a few scalar parameters; we denote the expansion coefficients of the function as $f$ and the scalar parameters $U$ (interaction strength) and $\mu$ (chemical potential), such that $D = [(f_1, f_2, \ldots, f_N)_{input}, U, \mu]$ (see Fig. 1). Note that the full DMFT problem and the impurity-solving part are the same problem as far as ML is concerned, the only difference being which database one chooses. The exact diagonalization method used here provides a representation of the input hybridization function in terms of bath level energies $\{\epsilon_{ml}\}$ and hybridization parameters $\{V_{ml}\}$ ($m$ labels entries in the database and $l$ labels the different bath energies and hybridization parameters for a given entry in the database), so in practice we use these for the $f_m$. We have also implemented machine learning using the representation of the input function in terms of Legendre polynomials, with essentially identical results (see Section V of the supplemental material). Section II.A of the supplemental material shows how the bare ED parameters are obtained from a known band structure.

Machine learning then estimates the solution $f(z) \to f = (f_1, f_2, \ldots, f_N)_{output}$ of a new problem in terms of an interpolation between known solutions. We use KRR, an expansion in the abstract multidimensional space of descriptors (each point $D$ of this space represents a unique problem and the distance between two points is the distance metric), obtaining

$$f_m(D) = \sum_{l} \alpha_{lm} K(D_l, D), \tag{1}$$

where $l$ labels points in the dataset, $m$ labels entries in the output vector and the kernel $K$ is a function whose main characteristic is to weight most heavily the contributions of $l$ for which $D_l$ is close to $D$. As in [22], we use the weighted exponential kernel and the Manhattan distance between $D_l$ and $D$ (both are defined in Section III.A of the supplemental material). The expansion coefficients $\alpha_{lm}$ are $\alpha = (K + \lambda I)^{-1} f$ [22], where $\alpha$ is a matrix containing all the $\alpha_{lm}$, $K$ is the kernel matrix and $\lambda$ is a regularization parameter. $\lambda$ and the free parameter of the kernel are chosen using standard cross-validation; see Section III.A of the supplemental material. In particular, as also found in [18], we found that the actual value chosen for $\lambda$ is not really important. This formalism is very general and could be applied to the learning of other types of functions.
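A minimal sketch of this regression step is given below, assuming NumPy and a synthetic training set. The descriptors and outputs are toy data rather than entries of the DMFT database, and the helper names (`exponential_kernel`, `krr_train`, `krr_predict`) are ours.

```python
import numpy as np


def exponential_kernel(a, b, sigma):
    """K(D_l, D) = exp(-|D_l - D|_1 / sigma) for every pair of rows of a and b."""
    manhattan = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    return np.exp(-manhattan / sigma)


def krr_train(descriptors, outputs, sigma, lam):
    """Solve (K + lam * I) alpha = f for the coefficient matrix alpha."""
    kernel = exponential_kernel(descriptors, descriptors, sigma)
    return np.linalg.solve(kernel + lam * np.eye(len(descriptors)), outputs)


def krr_predict(train_descriptors, alpha, new_descriptors, sigma):
    """f_m(D) = sum_l alpha_lm K(D_l, D) for each new descriptor D."""
    return exponential_kernel(new_descriptors, train_descriptors, sigma) @ alpha


def toy_solution(d):
    """Made-up smooth map from a descriptor to a two-component output vector."""
    return np.column_stack([np.sin(d.sum(axis=1)), np.cos(d[:, 0])])


rng = np.random.default_rng(1)
train_d = rng.uniform(-1.0, 1.0, size=(300, 3))
alpha = krr_train(train_d, toy_solution(train_d), sigma=2.0, lam=1e-8)

test_d = rng.uniform(-0.8, 0.8, size=(5, 3))
error = np.abs(krr_predict(train_d, alpha, test_d, sigma=2.0) - toy_solution(test_d)).max()
print(f"largest absolute error on held-out toy points: {error:.3e}")
```

Each column of `alpha` corresponds to one output coefficient $m$, so a whole vector of Legendre coefficients is learned with a single linear solve.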

Figure 2: (Color online) Machine Learning predicted quasi-particle weight $Z$ (black circles) as compared to the exact results (red dots) as a function of filling of the impurity for different $U$ and $t'$: (1) $U = 0.64$, $t' = -0.3t$; (2) $U = 1.44$, $t' = -0.1t$; (3) $U = 2.08$, $t' = 0$. Inset: Median relative difference as a function of the size of the learning set.

Figure 3: (Color online) Machine Learning predicted lattice density (black circles) as compared to the exact results (red dots) as a function of the chemical potential ($\mu$) for different $U$ and $t'$: (1) $U = 0.64$, $t' = -0.2t$; (2) $U = 1.44$, $t' = 0$; (3) $U = 2.08$, $t' = -0.1t$ (axis shifted, $\mu + 0.2$). Inset: Median relative difference as a function of the size of the learning set.

As the first two tests of our predictive power, we present scalar properties: the quasi-particle weight $Z = \left(1 - \left.\frac{\partial\, \mathrm{Im}\,\Sigma(i\omega_n)}{\partial \omega_n}\right|_{\omega_n \to 0}\right)^{-1}$ and the lattice density of electrons $n_{Lattice} = -\frac{2}{\pi}\int_{-\infty}^{0} d\omega\, \mathrm{Im}\,G_{Lattice}(\omega) = -2G_{Lattice}(\tau = \beta^-)$, as predicted from correlation functions reconstructed with ML-obtained Legendre polynomial coefficients. We estimate $Z$ from a quadratic fit to the values of the reconstructed self-energies at the three lowest Matsubara frequencies (see for example [32] for why $Z$ can be estimated on the imaginary axis). As easily seen from Fig. 11 of [31], values of $G(i\omega_n)$ (or $\Sigma(i\omega_n)$) for the first few $\omega_n$ are given solely by the first few coefficients of the expansion in Legendre polynomials. Hence, the prediction of $Z$ shows how well the first few coefficients are learned. The results are shown in Fig. 2 for typical values of the interaction, from weakly to strongly correlated metals, and for different $t'$. The predictions for these specific $D$ from the database are obtained by using all other examples as the training set. The predictions are in general very good, with slightly worse predictive power for larger correlation close to half-filling, where $Z$ is close to zero. To study the error more rigorously, we present in the inset of Fig. 2 what we call the median relative difference (MRD) for $Z$ as a function of the learning set size (see supplemental material Section VI for details). This shows the median value of predictions for fifty different random examples, each re-calculated with twenty different random learning sets. As shown in the inset of Fig. 2, the MRD of $Z$ is slightly below 1% for a small learning set of size 500 and reaches around 0.1% for the largest learning set. An error below 1% even for a small database is encouraging, especially since choosing completely random datasets is the worst-case scenario.
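The estimate of $Z$ from the three lowest Matsubara frequencies can be sketched as follows. The self-energy in the example is a toy polynomial with a known slope, not a DMFT result, and the function name is ours.

```python
import numpy as np


def quasiparticle_weight(im_sigma, beta):
    """Estimate Z from Im Sigma(i w_n) at the three lowest Matsubara frequencies.

    A parabola a*w^2 + b*w + c is passed through the three points; its slope
    at w = 0 (the coefficient b) approximates d Im Sigma / d w at zero frequency.
    """
    w = (2 * np.arange(3) + 1) * np.pi / beta
    a, b, c = np.polyfit(w, np.asarray(im_sigma[:3], dtype=float), deg=2)
    return 1.0 / (1.0 - b)


# Toy input (not DMFT data): Im Sigma(i w) = -slope * w + curvature * w^2,
# for which the zero-frequency slope is -slope and Z = 1 / (1 + slope).
beta = 50.0
w_n = (2 * np.arange(10) + 1) * np.pi / beta
toy_im_sigma = -1.5 * w_n + 0.3 * w_n**2
print(quasiparticle_weight(toy_im_sigma, beta))
```

Because only three frequencies enter, the estimate depends only on the low-order Legendre coefficients, which is why it probes how well those coefficients are learned.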

In the case of the lattice density of electrons as a function of chemical potential, the ML path from Fig. 1 is the one where we learn the $G_l$'s of the expansion of the local lattice Green's function and then reconstruct it in imaginary time. Since $P_l(1) = 1$ for all $l$, the density is $n_{Lattice} = -\frac{2}{\beta} \sum_{l} \sqrt{2l+1}\, G_l$. Therefore, contrary to the case of $Z$, the prediction of the density uses all predicted coefficients of the expansion. We show results in Fig. 3 for typical parameters, different from those presented for $Z$. To improve readability, we shifted curve (3) by 0.2. Once again the results are in good agreement, with slightly worse predictions for $n_{Lattice} > 1$. This region also tends to be more problematic for $Z$. This is not fundamental but rather arises because our DMFT database is not as well constructed there. In the inset of Fig. 3 we show the MRD calculated the same way as for $Z$. ML does even better in this case, where the MRD is at worst $\sim 0.25\%$.
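The reconstruction in imaginary time and the density formula can be illustrated with a short sketch, assuming NumPy's Legendre utilities. The coefficient vector used in the example is a toy case (a single level at the chemical potential, for which $G(\tau) = -1/2$), not an ML prediction.

```python
import numpy as np
from numpy.polynomial import legendre


def reconstruct_tau(coeffs, tau, beta):
    """G(tau) = sum_l sqrt(2l+1)/beta * P_l(x(tau)) * G_l with x(tau) = 2 tau / beta - 1."""
    l = np.arange(len(coeffs))
    weights = np.sqrt(2 * l + 1) / beta * np.asarray(coeffs, dtype=float)
    return legendre.legval(2.0 * np.asarray(tau) / beta - 1.0, weights)


def lattice_density(coeffs, beta):
    """n = -2 G(tau = beta^-) = -(2/beta) sum_l sqrt(2l+1) G_l, using P_l(1) = 1."""
    l = np.arange(len(coeffs))
    return -2.0 / beta * np.sum(np.sqrt(2 * l + 1) * np.asarray(coeffs, dtype=float))


# Toy check: G(tau) = -1/2 has G_0 = -beta/2 and all higher coefficients zero,
# which corresponds to a half-filled level (n = 1 including spin).
beta = 20.0
toy_coeffs = np.array([-beta / 2.0, 0.0, 0.0])
print(lattice_density(toy_coeffs, beta), -2.0 * reconstruct_tau(toy_coeffs, beta, beta))
```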

We now show in Fig. 4-(a) and -(b) the prediction of the imaginary part of the impurity Green's function in Matsubara frequency for two typical sets of parameters. As can be seen, ML does a very good job at predicting both the metal and the insulator, although the number of insulating solutions in the database is not very large. In the inset of Fig. 4-(a), we present the average relative difference (ARD) for the metallic case as a function of the size of the learning set. The ARD was defined in [22] as a way to measure, on average, the accuracy of the prediction of a full function using only one number. The values are obtained by averaging predictions for many random test sets. The global average prediction of a full function in the metallic case has an error in the worst case of $\sim 0.8\%$, which shows the predictive power of our ML scheme.

Figure 4: (Color online) Machine Learning predicted impurity Green's function (black circles) as compared to the exact results (red dots): (a) $U = 2.24$, $t' = 0$, $n_d \sim 0.92$; (b) $U = 3.68$, $t' = -0.2t$, $n_d = 1$. (c) ARD as a function of the size of the learning set for the metallic phase.

We finally analyse the question of the prediction of a totally new problem and the importance of training. Because our database is very homogeneous, for out-of-database predictions we chose to use a kernel width (arbitrarily set to $\sigma = 10$) larger than the one giving the lowest error in the cross-validation training used for the previous results, in order to avoid overfitting. In the supplemental material (Section III.B), we show how overfitting influences the predictive power of our ML approach. We show in Fig. 5 that we can indeed predict very well the DMFT solution for new problems sharing no values of $U$, $t'$ and $\mu$ with the database, by choosing as an example $t' = -0.16t$, $U = 2$ and $\mu$ such that $n_d \sim 0.85$. We trained a machine with the full database of 1783 metallic solutions, the metal being well predicted by decision tree classification. By this process (larger width), we lose some of the predictive precision we had for intra-database testing sets, but this is of no consequence: once we have shown that we can accurately train a machine, what really matters is the predictive power for out-of-database unsolved problems.

Figure 5: (Color online) Machine Learning predicted imaginary part of the impurity Green's function (black circles) as compared to the exact results (red dots) for $U = 2$, $t' = -0.16t$, $n_d \sim 0.85$.

In this paper, we have investigated how machine learning can be used in many-body physics as a method to predict correlation functions. We applied the scheme to DMFT and showed that we can accurately predict its solutions. Our approach maps input functions to output functions and can be applied without any changes as an impurity solver for DMFT, rather than to learn the fully converged solution, or to any other cluster embedding theory with a self-consistency relation. Impurity solving might be the best way to use ML for real-materials predictions, since accuracy depends largely on having a large database. The approach is also general enough to be applied to other problems where learning a function is important. The learning of a function using Kernel Ridge Regression might be improved by adding a simple form of constraints in the minimization problem at small computational cost [33]. In real-materials applications a more complicated system has to be taken into account (Hund's coupling, multiple bands, etc.). However, the ML approach presented here is general enough to be adapted.
3138 and A.J.M. by DOE EG-ER04169. L.-E.A. thanks
Supplemental Material for Machine learning for many-body physics: efficient solution of dynamical mean-field theory


I. DENSITY OF STATES FOR THE SIMPLE CUBIC LATTICE

The dispersion relation for the single-band tight-binding simple cubic lattice with nearest and next-nearest hopping is given by

$$\varepsilon_k = -2t \sum_{a=1}^{3} \cos(k_a) - 4t' \left[\cos(k_1)\cos(k_2) + \cos(k_1)\cos(k_3) + \cos(k_2)\cos(k_3)\right], \tag{1}$$

where $-\pi < k_a < \pi$ and $a = 1, 2, 3$ labels the three Cartesian directions of the nearest-neighbor bonds of the cubic lattice. The bandwidth $W = \varepsilon_{[\pi,\pi,\pi]\ \mathrm{or}\ [0,\pi,\pi]} - \varepsilon_{[0,0,0]}$ is given by

$$W = \begin{cases} 12t & \text{for } |t'| < t/4, \\ 8t + 16|t'| & \text{for } |t'| > t/4. \end{cases} \tag{2}$$

As mentioned in the main text, we define the energy unit by fixing $W = 2$. This fixes the value of $t$ for the different $t'$. Our database contains data for $t' = [0, -0.1t, -0.2t, -0.3t]$. We also tested our predictive power by using a lattice with $t' = -0.16t$. We show in Fig. 1 the density of states

$$N_0(\omega) = \sum_k \delta(\omega - \varepsilon_k) \tag{3}$$

of these five lattices to show the effect of next-neighbor hopping.
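For readers who wish to reproduce curves of this kind, a histogram estimate of the density of states can be sketched as follows, assuming NumPy. Random sampling of the Brillouin zone and the bin count are our illustrative choices, not the procedure used for Fig. 1, and the histogram is normalized to unit area.

```python
import numpy as np


def sampled_dos(t, tp, n_samples=200_000, n_bins=80, seed=0):
    """Histogram estimate of N_0(w) from random k points, normalized to unit area."""
    rng = np.random.default_rng(seed)
    c = np.cos(rng.uniform(-np.pi, np.pi, size=(n_samples, 3)))
    eps = -2.0 * t * c.sum(axis=1) - 4.0 * tp * (
        c[:, 0] * c[:, 1] + c[:, 0] * c[:, 2] + c[:, 1] * c[:, 2]
    )
    density, edges = np.histogram(eps, bins=n_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


# t' = 0 with W = 12 t = 2, i.e. t = 1/6: the band extends from -1 to 1.
centers, density = sampled_dos(t=1.0 / 6.0, tp=0.0)
bin_width = centers[1] - centers[0]
print(f"band edges ~ [{centers[0]:.2f}, {centers[-1]:.2f}], area = {density.sum() * bin_width:.3f}")
```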


Figure 1: (Color online) Non-interacting density of states for the three-dimensional simple cubic lattice with different values of the next-nearest-neighbor hopping: $t' = 0$ (black solid curve), $t' = -0.1t$ (blue dash-dot curve), $t' = -0.16t$ (black dashed curve), $t' = -0.2t$ (red dotted curve) and $t' = -0.3t$ (magenta solid curve and filled black circles).

II. DETAILS OF THE EXACT DIAGONALIZATION SOLVER

A. Fitting of the non-interacting hybridization function

In exact diagonalization, the bath is replaced by a finite number of sites ($N_b$), each characterized by an on-site energy $\epsilon_l$ and a hybridization with the impurity $V_l$. Therefore, the hybridization function is given by

$$\Delta_{ED}(i\omega_n) = \sum_{l}^{N_b} \frac{V_l^2}{i\omega_n - \epsilon_l} \Rightarrow \{\epsilon_l, V_l\}, \tag{4}$$

which represents in real frequency the approximation of replacing a continuous function by a sum of poles and strengths. These poles and strengths $\{\epsilon_l, V_l\}$ have to be found by fitting Eq. (4) to the target function $G_{imp,0}$ obtained via the lattice Green's function, $G^{-1}_{Lattice}(i\omega_n) = G^{-1}_{imp,0}(i\omega_n) - \Sigma(i\omega_n)$. This is achieved by defining a distance function and minimizing it:

$$d = \frac{1}{N+1} \sum_{n=0}^{N} W(\omega_n) \left| G^{-1}_{imp,0}(i\omega_n) - \left[G^{ED}_{imp,0}\right]^{-1}(i\omega_n) \right|^2, \tag{5}$$

where the function $W(\omega_n)$ is chosen to give more weight to some frequencies if desired. We use a $W(\omega_n)$ that favors small $\omega_n$, to have a better fit of the low frequencies. $N$ is the maximum number of frequencies used to define $d$. What is important in its choice is that $\omega_{N_{max}} \sim \max(\epsilon_l)$. Finally, $\left[G^{ED}_{imp,0}\right]^{-1}(i\omega_n)$ is the inverse non-interacting Green's function of the Hamiltonian with a finite number of bath sites and is written as

$$\left[G^{ED}_{imp,0}\right]^{-1}(i\omega_n) = i\omega_n + \mu - \underbrace{\sum_{l}^{N_b} \frac{V_l^2}{i\omega_n - \epsilon_l}}_{\Delta_{ED}}. \tag{6}$$

Therefore, we need to find the set of parameters $\{\epsilon_l, V_l\}$ that minimizes $d$. This is a problem of unconstrained optimization in several variables. In DMFT, this is done as many times as necessary to converge the solution. We can define a bare hybridization function, for $U = 0$, a function containing the information about the crystal structure and chemistry of the problem. For simplicity, we choose to fix $\mu = 0$. For the non-interacting case, $\Sigma(i\omega_n) = 0$ and the lattice Green's function is given by the band dispersion only:

$$G_{Lattice}(i\omega_n) = \sum_k \frac{1}{i\omega_n - \varepsilon_k}. \tag{7}$$

We can therefore define the bare hybridization function

$$\Delta^0(i\omega_n) = i\omega_n - G^{-1}_{Lattice}(i\omega_n). \tag{8}$$

This can be fitted to $\{\epsilon_l^0, V_l^0\}$ using Eq. (5) and gives a representation of the crystal of size equal to $2N_b$, where $N_b$ is the number of bath sites. The function in this case is thus represented in this basis as $\Delta^0(i\omega_n) \to f = (\epsilon_1^0, \ldots, \epsilon_{N_b}^0, V_1^0, \ldots, V_{N_b}^0)$. It could perhaps be seen as a general and compact way to describe the lattice for ML, irrespective of how the database is constructed (using ED or not). After all, what is needed for creating a $D$ is a unique way to describe a problem, and $[\{\epsilon_l^0, V_l^0\}, U, \mu]$ provides one. This is true both for model Hamiltonians and real materials, as well as for solutions obtained from ED, quantum Monte Carlo, etc.
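The quantities entering the bath fit of this subsection can be sketched as below, assuming NumPy. The weight $W(\omega_n) = 1/\omega_n$ and the squared modulus in the distance are assumptions made for this illustration, and the target is generated from known toy bath parameters so that the minimum of $d$ is known in advance. The function `fit_distance` can be handed to any unconstrained multivariable optimizer; the minimization itself is not shown.

```python
import numpy as np


def delta_ed(iw, eps, v):
    """Discretized hybridization Delta_ED(i w_n) = sum_l V_l^2 / (i w_n - eps_l)."""
    return np.sum(v[None, :] ** 2 / (iw[:, None] - eps[None, :]), axis=1)


def inverse_g0_ed(iw, eps, v, mu=0.0):
    """Inverse non-interacting impurity Green's function i w_n + mu - Delta_ED(i w_n)."""
    return iw + mu - delta_ed(iw, eps, v)


def fit_distance(params, inverse_g0_target, beta, mu=0.0):
    """Distance d between target and bath model; params = [eps_1..eps_Nb, V_1..V_Nb].

    Assumed weight: W(w_n) = 1 / w_n, which emphasizes low frequencies.
    """
    params = np.asarray(params, dtype=float)
    n_bath = len(params) // 2
    eps, v = params[:n_bath], params[n_bath:]
    w_n = (2 * np.arange(len(inverse_g0_target)) + 1) * np.pi / beta
    diff = inverse_g0_target - inverse_g0_ed(1j * w_n, eps, v, mu)
    return np.sum(np.abs(diff) ** 2 / w_n) / len(w_n)


# Toy target built from two known bath sites (illustrative numbers only).
beta = 40.0
w_n = (2 * np.arange(64) + 1) * np.pi / beta
true_params = np.array([-0.5, 0.5, 0.3, 0.3])
target = inverse_g0_ed(1j * w_n, true_params[:2], true_params[2:])

print(fit_distance(true_params, target, beta))
print(fit_distance(true_params + 0.1, target, beta))
```

The first printed distance vanishes because the target was generated from the same parameters; the second is strictly positive.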

B. Particularities of the insulating state in ED 

The database contains 1783 converged DMFT standard metal solutions, 218 converged critical metal solutions and 494 Mott insulating solutions, obtained from exact diagonalization solutions of the DMFT equations. In the Mott insulating state, physically, any choice of chemical potential in the gap should lead to the same solution with a shifted zero point of the frequency axis. However, the bath discretization of the ED method means that, for otherwise identical parameters, different values of $\mu$ lead to different insulating solutions. For this reason we need to include many (here 494) different Mott insulators in the database. We train insulators using all these solutions.

III. CROSS VALIDATION 
A. Principle and process 

In kernel ridge regression (KRR), there are two free parameters or hyperparameters. In our case, for $K$, we use the weighted exponential kernel

$$K(D_l, D) = e^{-|d_l|/\sigma}, \tag{9}$$

where $|d_l| = |D_{l1} - D_1| + |D_{l2} - D_2| + \ldots$ is the Manhattan distance between the two parameter sets in descriptor space and $\sigma$ gives the radius of effect that a particular point $D_l$ of the data set will have in the prediction process. Therefore, the two free parameters are $\sigma$ (entering the kernel function) and $\lambda$, the regularization parameter used in the cost function that is minimized in KRR. To fix them, we use cross-validation. Cross-validation proceeds by first creating a large number of pairs $[\sigma, \lambda]$. Then, for each pair, we randomly split the database so that the testing set contains about ten examples. For each of these ($\sim 10$) tests, we predict only the first five Legendre polynomial coefficients $G_{l=0 \ldots 4}$ and calculate the total mean absolute error. Note that the actual metric used to calculate the error is not really important; we are just looking for the pair that minimizes an error. We look for a pair $[\sigma, \lambda]$ that gives as small an error as possible; this $[\sigma, \lambda]$ is then used to learn all the necessary Legendre polynomial coefficients and not only the first five. As an example, we show the result for the training of the metallic impurity Green's function as a contour plot in Fig. 2, when the descriptor uses the ED representation for the non-interacting hybridization function, and in Fig. 3, when the Legendre representation for the non-interacting hybridization function is chosen. The white color indicates regions where no data are available. We see that if the width $\sigma$ of the kernel is too small (ML will not use enough solutions in descriptor space) or too large (ML will use too many faraway solutions in descriptor space), the error is the largest. For the training/testing using solved problems from our database, the best possible $\sigma$ would be around but smaller than $\sigma = 1$ for Fig. 2 and around $\sigma = 1.5$ for Fig. 3. The results also show that the value of $\lambda$ is not extremely important. In practice, we chose the pair $[\sigma, \lambda]$ that produced the smallest error among the ones tested, while we increase $\sigma$ to prevent overfitting in the case of out-of-database prediction (see Section III.B below).
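The grid search over pairs $[\sigma, \lambda]$ described above can be sketched as follows, assuming NumPy. The database is synthetic (five smooth output columns standing in for the first five Legendre coefficients), and the grid ranges are arbitrary illustrative choices.

```python
import numpy as np


def kernel(a, b, sigma):
    """Weighted exponential kernel with Manhattan distance, as in Eq. (9)."""
    return np.exp(-np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2) / sigma)


def validation_error(descriptors, outputs, sigma, lam, rng, n_test=10):
    """Mean absolute error on a fresh random split with n_test test examples."""
    order = rng.permutation(len(descriptors))
    test, train = order[:n_test], order[n_test:]
    k_train = kernel(descriptors[train], descriptors[train], sigma)
    alpha = np.linalg.solve(k_train + lam * np.eye(len(train)), outputs[train])
    predicted = kernel(descriptors[test], descriptors[train], sigma) @ alpha
    return np.mean(np.abs(predicted - outputs[test]))


def grid_search(descriptors, outputs, sigmas, lams, seed=0):
    """Return the (sigma, lambda) pair with the smallest error and the full error table."""
    rng = np.random.default_rng(seed)
    errors = np.array([[validation_error(descriptors, outputs, s, l, rng)
                        for l in lams] for s in sigmas])
    i, j = np.unravel_index(np.argmin(errors), errors.shape)
    return (sigmas[i], lams[j]), errors


# Synthetic database (not DMFT data).
rng = np.random.default_rng(2)
toy_d = rng.uniform(-1.0, 1.0, size=(80, 2))
toy_f = np.column_stack([np.sin((m + 1) * toy_d.sum(axis=1)) for m in range(5)])

sigmas = np.logspace(-2, 2, 9)
lams = np.logspace(-10, -2, 5)
best, errors = grid_search(toy_d, toy_f, sigmas, lams)
print(f"best sigma = {best[0]:.3g}, best lambda = {best[1]:.3g}")
```

The table `errors` is the quantity one would display as a contour plot over $\log(\sigma)$ and $\log(\lambda)$.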



Figure 2: (Color online) Contour plot showing the mean absolute error for the first five Legendre polynomial coefficients as a function of the hyperparameter pair $[\sigma, \lambda]$ in cross-validation for the impurity Green's function when the descriptor is chosen to be $D = [\{\epsilon_l^0, V_l^0\}, U, \mu]$.

Figure 3: (Color online) Contour plot showing the mean absolute error for the first five Legendre polynomial coefficients as a function of the hyperparameter pair $[\sigma, \lambda]$ in cross-validation for the impurity Green's function when the descriptor is chosen to be $D = [\{\Delta_l^0\}, U, \mu]$.


B. Overfitting and prediction for a new problem

Here we discuss in detail the prediction of the DMFT solution for the case $t' = -0.16t$, $U = 2$ and $n_d \sim 0.85$, which is a case completely outside our database. Let us first use the width of the kernel that gave the lowest error during cross-validation using $D = [\{\epsilon_l^0, V_l^0\}, U, \mu]$ as the descriptor, which is around $\sigma = 1$ as shown in Section III.A. The result for the imaginary part of the reconstructed impurity Green's function is shown in Fig. 4-(a), while Fig. 4-(b) shows the ML-predicted coefficients $G_l$. We see a systematic discrepancy in the prediction of the reconstructed $\mathrm{Im}\{G(i\omega_n)\}$, but it is interesting to realize that this is due to a systematic discrepancy of only $\sim 2\%$ on the even $G_l$ coefficients at low order. This is true for every possible example at $t' = -0.16t$ (we calculated a total of 295 different examples with this $t'$). The discrepancy is not due to a failure of ML but solely to too tight a choice of kernel width (over-fitting), which for example might not use enough of the solutions for $t' = -0.1t$ and/or $t' = -0.2t$. We trained our machine again, but with a larger width of $\sigma = 10$ which, according to Fig. 2, is not the optimal value but still gives an acceptable error. The result is Fig. 5 of the main text and is reproduced here as Fig. 4-(c), showing that we must be careful with over-fitting and that, if this is properly taken into account, we can predict new DMFT solutions. The specific choice of $\sigma$ is not well defined, since out-of-database prediction means that, in principle, no comparison with exact results is available. A sensible approach is to start from the $\sigma$ given by the cross-validation of Section III.A and search for a larger value. To do so, the contour plots of Figs. 2 and 3 are essential. We need a value to the right of the cross-validated one, since we are looking for a larger radius of effect for database points. How large is set by the condition of still keeping a low enough error for intra-database prediction. Hence, by looking at Figs. 2 and 3, we see that values of $\sigma$ somewhere between $\log(\sigma) = 0.5$ and $\log(\sigma) = 1$ respect these conditions. We chose $\log(\sigma) = 1$, but we could have tested many other values.




Figure 4: (Color online) Machine Learning predicted impurity Green's function (black circles) with descriptor $D = [\{\epsilon_l^0, V_l^0\}, U, \mu]$ as compared to the exact results (red dots) for $U = 2$, $t' = -0.16t$, $n_d \sim 0.85$. (a) Prediction of $\mathrm{Im}\{G(i\omega_n)\}$ with a kernel width that is too small (over-fitted). (b) Prediction of $G_l$ with a kernel width that is too small (over-fitted). (c) Prediction of $\mathrm{Im}\{G(i\omega_n)\}$ with an appropriate kernel width.

IV. OBTAINING THE LEGENDRE COEFFICIENTS

We expand the output solution in terms of Legendre polynomials. Considering the result as a function of imaginary time $0 < \tau < \beta$ ($\beta = 1/T$ is the inverse temperature), we have for correlation functions [1]

$$f(\tau) = \sum_{l=0}^{\infty} \frac{\sqrt{2l+1}}{\beta} P_l(x(\tau)) f_l, \qquad x(\tau) = \frac{2\tau}{\beta} - 1. \tag{10}$$
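The ED method we used to do DMFT calculations provides correlation functions in energy space (real or Matsubara frequency); we must therefore Fourier transform the result. To perform the Fourier transform we note that the functions $\Delta(i\omega_n)$, $G_{Lattice}(i\omega_n)$ and $\Sigma(i\omega_n)$ of interest here all decay as $C/i\omega_n$ at large $|\omega_n|$. Therefore, let us define the general Matsubara frequency function $f(i\omega_n)$ with asymptotic behaviour $f(i\omega_n) \to C/i\omega_n$ for $n \to \pm\infty$. Depending on the specific correlation function, the constant $C$ is given by

$$C = \begin{cases} 1 & \text{for } G, \\ U^2 \frac{n}{2}\left(1 - \frac{n}{2}\right) & \text{for } \Sigma, \\ \sum_l V_l^2 & \text{for } \Delta \text{ (ED case)}. \end{cases} \tag{11}$$

The imaginary time function is given by the Fourier transform $f(\tau) = T \sum_n e^{-i\omega_n \tau} f(i\omega_n)$. Direct numerical evaluation of the sum is complicated by the slow $1/\omega_n$ decay. We therefore treat the $1/\omega_n$ term analytically:

$$\begin{aligned} f(\tau) &= T \sum_n e^{-i\omega_n \tau} \left[ f(i\omega_n) - \frac{C}{i\omega_n} \right] + T \sum_n e^{-i\omega_n \tau} \frac{C}{i\omega_n} \\ &= 2T \sum_{n \geq 0} \left[ \mathrm{Re}\{f(i\omega_n)\} \cos(\omega_n \tau) + \left( \mathrm{Im}\{f(i\omega_n)\} + \frac{C}{\omega_n} \right) \sin(\omega_n \tau) \right] - \frac{C}{2}. \end{aligned} \tag{12}$$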
This way, we can obtain the value of $f(\tau)$ for any desired $\tau$. We can therefore compute the coefficients of the expansion in Legendre polynomials, as was done in [2], by using the algorithm based on the Chebyshev-Legendre transform [3], which exploits the idea that smooth functions can also be represented by expansions in Chebyshev polynomials using the fast Fourier transform. This algorithm is implemented in a free MATLAB toolbox called CHEBFUN [4].
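The two steps of this section can be illustrated with the sketch below, assuming NumPy. It evaluates $f(\tau)$ from Matsubara data with the $C/i\omega_n$ tail summed analytically, then obtains Legendre coefficients by Gauss-Legendre quadrature. The quadrature is a simple stand-in chosen for brevity and is not the Chebyshev-Legendre transform of CHEBFUN; the test function is the Green's function of a single non-interacting level, for which $C = 1$ and the exact $f(\tau)$ is known.

```python
import numpy as np
from numpy.polynomial import legendre


def to_imaginary_time(f_iw, tau, beta, tail_c):
    """f(tau) from f(i w_n), n >= 0, with the C/(i w_n) tail treated analytically."""
    w_n = (2 * np.arange(len(f_iw)) + 1) * np.pi / beta
    phase = np.outer(np.atleast_1d(tau), w_n)
    summand = f_iw.real * np.cos(phase) + (f_iw.imag + tail_c / w_n) * np.sin(phase)
    return 2.0 / beta * summand.sum(axis=1) - tail_c / 2.0


def legendre_coefficients(f_of_tau, beta, n_coeffs, n_nodes=64):
    """f_l = sqrt(2l+1) * int_0^beta dtau P_l(x(tau)) f(tau), by Gauss-Legendre quadrature."""
    x, weights = legendre.leggauss(n_nodes)
    values = f_of_tau(0.5 * beta * (x + 1.0))
    coeffs = []
    for l in range(n_coeffs):
        p_l = legendre.legval(x, [0.0] * l + [1.0])
        coeffs.append(np.sqrt(2 * l + 1) * 0.5 * beta * np.sum(weights * p_l * values))
    return np.array(coeffs)


# Toy input: G(i w_n) = 1 / (i w_n - level), whose exact imaginary-time form is
# G(tau) = -exp(-level * tau) / (1 + exp(-beta * level)).
beta, level = 10.0, 0.5
w_n = (2 * np.arange(20_000) + 1) * np.pi / beta
g_iw = 1.0 / (1j * w_n - level)


def g_tau(tau):
    return to_imaginary_time(g_iw, tau, beta, tail_c=1.0)


coeffs = legendre_coefficients(g_tau, beta, n_coeffs=12)
print(np.round(coeffs[:4], 4))
```

The coefficients decay quickly with $l$ for this smooth function, which is the property that makes the Legendre basis compact.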

Once the coefficients have been obtained, not only can the function in imaginary time be reconstructed from Eq. (10), but the Fourier transform to $f(i\omega_n)$ can also be done analytically [1] to give

$$f(i\omega_n) = \sum_{l=0}^{\infty} T_{nl} f_l, \tag{13}$$

where $T_{nl} = (-1)^n\, i^{\,l+1} \sqrt{2l+1}\; j_l\!\left(\frac{(2n+1)\pi}{2}\right)$ and $j_l(z)$ are the spherical Bessel functions.

Figure 5: (Color online) Machine Learning predictions (black circles) for descriptor $D = [\{\epsilon_l^0, V_l^0\}, U, \mu]$ and (blue x) for descriptor $D = [\{\Delta_l^0\}, U, \mu]$ as compared to the exact results (red dots). (a) Quasi-particle weight $Z$ as a function of filling of the impurity for different $U$ and $t'$: (1) $U = 0.64$, $t' = -0.3t$; (2) $U = 1.44$, $t' = -0.1t$; (3) $U = 2.08$, $t' = 0$. (b) Lattice density as a function of the chemical potential ($\mu$) for different $U$ and $t'$: (1) $U = 0.64$, $t' = -0.2t$; (2) $U = 1.44$, $t' = 0$; (3) $U = 2.08$, $t' = -0.1t$ (axis shifted, $\mu + 0.2$).


V. CHOICE OF REPRESENTATION FOR THE BARE HYBRIDIZATION FUNCTION $\Delta^0(i\omega_n)$

In the main text we claimed that the results we presented hold irrespective of whether the chosen representation of the bare hybridization function is an ED-like fit $\{\epsilon_l^0, V_l^0\}$ or an expansion in terms of Legendre polynomials $\{\Delta_l^0\}$. We show here two examples to support our statement. In Fig. 5-(a) and (b), we reproduce Fig. 2-(a) and Fig. 3-(a) of the main text. In addition, we add the predictions (blue x) obtained from the descriptor of the form $D = [\{\Delta_l^0\}, U, \mu]$. The predictions for the quasi-particle weight are very close to the case when the machine is trained with $D = [\{\epsilon_l^0, V_l^0\}, U, \mu]$. Other optimizations could be done to get even closer to the exact solutions, and thus for the intended purpose the two can be considered equal. The case of the lattice density is similar: the two representations again give practically the same answer. In conclusion, the two choices of representation for the bare hybridization function in the descriptor give essentially the same results, and the choice is therefore a matter of which one can be obtained and how easy it is to calculate in a particular situation.

VI. MEDIAN RELATIVE DIFFERENCE 

We present exactly how we calculated the median relative difference for the quasi-particle weight. The same approach is taken for the density. The procedure is the following: 1) A size $N_{learning}$ of learning set is chosen. 2) From the database, we randomly select one example that will be the testing system. 3) From the remaining examples in the database, we randomly take $N_{learning}$ solutions and train a machine (calculate the $\alpha$ matrix of KRR). 4) We predict the self-energy for the testing example of 2) using our trained machine. From it, we obtain $Z$ and can calculate the relative difference with the exact answer, $100 \times |Z_{ML} - Z_{exact}| / Z_{exact}$. We then repeat steps 3) and 4) twenty times to obtain the prediction of the same example from many different trained machines and thus ensure that we have a large distribution of learning sets. Finally, we go back to step 2) and start again with a new example, doing so fifty times in total to have a large distribution of testing examples. We therefore have one thousand relative differences. If a randomly chosen learning set only contains examples very far from the testing example we are predicting, the relative difference will be large, not because ML is bad but because the learning set was badly picked. Therefore, to have a good idea of how well a learning set of size $N_{learning}$ does, we argue that a good choice is to take the median value of the one thousand predictions. For the largest possible size of learning set, 1782, we instead calculated the relative difference for every example in the database, choosing the remaining 1782 examples as the learning set. Then we randomly chose fifty of these relative differences and calculated the median.
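The procedure above can be summarized in a short sketch using only the Python standard library. The `train` and `predict` callables are hypothetical interfaces to be supplied by the user (for example a KRR machine followed by the extraction of $Z$), and the toy database and predictor in the example are deliberately artificial so that the expected median is known.

```python
import random
import statistics


def median_relative_difference(database, n_learning, train, predict,
                               n_tests=50, n_repeats=20, seed=0):
    """Median of 100*|value_ML - value_exact|/|value_exact| over n_tests * n_repeats predictions.

    database: list of (descriptor, exact_value) pairs.
    train(learning_set) -> model and predict(model, descriptor) -> value are
    caller-supplied (hypothetical) interfaces.
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(n_tests):
        # Step 2: pick one testing example at random.
        test_index = rng.randrange(len(database))
        descriptor, exact = database[test_index]
        remaining = database[:test_index] + database[test_index + 1:]
        for _ in range(n_repeats):
            # Step 3: draw a random learning set from the remaining examples.
            learning_set = rng.sample(remaining, n_learning)
            model = train(learning_set)
            # Step 4: predict and record the relative difference in percent.
            predicted = predict(model, descriptor)
            differences.append(100.0 * abs(predicted - exact) / abs(exact))
    return statistics.median(differences)


# Artificial example: exact values 1 + x, and a "machine" that is always 10% too large.
toy_database = [(x / 100.0, 1.0 + x / 100.0) for x in range(100)]
mrd = median_relative_difference(
    toy_database,
    n_learning=30,
    train=lambda learning_set: None,
    predict=lambda model, descriptor: 1.1 * (1.0 + descriptor),
)
print(f"median relative difference: {mrd:.2f}%")
```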